Audio-visual quality as combination of unimodal qualities: environmental effects on talking heads
Abstract
Introduction
Talking heads provide a multimodal output component for human-computer interfaces. They consist of visual facial models whose articulation is synchronized with a speech synthesis module. Because they are reduced to a human head or upper body, the head can be displayed larger, so articulation is often more clearly visible than with a full-body agent. Talking heads are therefore especially suited to applications such as robust speech understanding and language acquisition. Evaluation typically consists of functional tests that assess synthesis quality with metrics such as the word error rate of human listeners or perceived naturalness (cf. [8]). However, as talking heads are increasingly used as interfaces to speech-based dialogue systems and enhanced with facial expressions, the overall quality experienced by the user comes into scope.
Similar resources
Quality of talking heads in different interaction and media contexts
We investigate the impact of three different factors on the quality of talking heads as metaphors of a spoken dialogue system in the smart home domain. The main focus lies on the effect of voice and head characteristics on audio and video quality, as well as overall quality. Furthermore, the influence of interactivity and of media context on user perception is analysed. For this purpose two sub...
Audio-Visual Prosody: Perception, Detection, and Synthesis of Prominence
In this chapter, we investigate the effects of facial prominence cues, in terms of gestures, when synthesized on animated talking heads. In the first study a speech intelligibility experiment is conducted, where speech quality is acoustically degraded, then the speech is presented to 12 subjects through a lip synchronized talking head carrying head-nods and eyebrow raising gestures. The experim...
Investigating Communicative Feedback Phenomena across Languages and Modalities
This thesis deals with human communicative behaviour related to feedback, analysed across languages (Italian and Swedish), modalities (auditory versus visual) and different communicative situations (human-human versus human-machine dialogues). The aim of this study is to give more insight into how humans use communicative behaviour related to feedback and at the same time to suggest a method to...
Multimodal Speech Synthesis
Multimodal Speech Synthesis ("Talking Heads") encompasses synthesis of speech from text ("Text-to-Speech", TTS) plus synthesis of a visual presentation of a face that is lip-synced to the generated audio ("Visual TTS", VTTS). Talking Heads are now practical because of the ever-increasing computing power and falling prices of computer hardware. This paper highlights recent technological breakthr...
A comparison of German talking heads in a smart home environment
The authors describe a newly developed German Text-to-Audiovisual-Speech (TTavS) synthesis system based on the English-speaking HeadZero. Targets of the control parameters of the talking head are generated by mapping German phonemes to the original English visemic blend-shape controls. The resulting German version of HeadZero and the German talking head MASSY were extended to generate audi...